Embarrassingly parallel sequential Markov-chain Monte Carlo for large sets of time series
Authors
Abstract
Bayesian computation crucially relies on Markov chain Monte Carlo (MCMC) algorithms. In the case of massive data sets, running the Metropolis-Hastings sampler to draw from the posterior distribution becomes prohibitive due to the large number of likelihood terms that need to be calculated at each iteration. In order to perform Bayesian inference for a large set of time series, we consider an algorithm that combines “divide and conquer” ideas previously used to design MCMC algorithms for big data with a sequential MCMC strategy. The performance of the method is illustrated using a large set of financial data.
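To make the per-iteration cost concrete, here is a minimal sketch of the bottleneck the paper targets, not of the paper's own algorithm: plain random-walk Metropolis for a single AR(1) coefficient shared across many series, where every iteration must recompute likelihood terms for every series. The model, sizes, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: S short AR(1) series sharing one autoregressive
# coefficient phi. Sizes and the shared-coefficient model are
# illustrative assumptions, not taken from the paper.
S, T = 5000, 50
phi_true = 0.6
series = np.empty((S, T))
series[:, 0] = rng.normal(size=S)
for t in range(1, T):
    series[:, t] = phi_true * series[:, t - 1] + rng.normal(size=S)

def log_posterior(phi):
    """Flat prior on (-1, 1); Gaussian AR(1) likelihood summed over
    ALL series. This full scan at every iteration is the bottleneck."""
    if abs(phi) >= 1.0:
        return -np.inf
    resid = series[:, 1:] - phi * series[:, :-1]
    return -0.5 * np.sum(resid ** 2)

# Random-walk Metropolis: each iteration recomputes the likelihood over
# all S series, so the cost per draw grows linearly with the data set.
phi, lp = 0.0, log_posterior(0.0)
draws = []
for _ in range(2000):
    prop = phi + 0.01 * rng.normal()
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        phi, lp = prop, lp_prop
    draws.append(phi)

print("posterior mean of phi:", np.mean(draws[500:]))
```

Divide-and-conquer strategies avoid this full scan by splitting the series across workers, so that each chain only ever touches its own shard of the data.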
Similar papers
Individual adaptation: an adaptive MCMC scheme for variable selection problems
The increasing size of data sets has made variable selection in regression increasingly important. Bayesian approaches are attractive since they allow uncertainty about the choice of variables to be formally included in the analysis. Applying fully Bayesian variable selection methods to large data sets is computationally challenging. We describe an adaptive Markov chain Mo...
A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data
Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data with complex structure. However, they are computer-intensive, typically requiring a large number of iterations and a complete scan of the full dataset at each iteration, which precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) al...
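The abstract is cut off before the BMH construction itself. For contrast, the sketch below shows the naive shortcut that such corrected algorithms are designed to improve on: plugging a scaled random-subsample estimate of the log-likelihood into the Metropolis-Hastings ratio. All data, names, and sizes are illustrative assumptions, and the comments flag why the naive version is unreliable.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: n draws from N(1, 1). Nothing here is taken from
# the BMH paper; this is a generic illustration of naive subsampling.
n = 10_000
data = rng.normal(loc=1.0, size=n)

def subsampled_loglik(theta, m=1_000):
    """Scaled subsample estimate of the full-data log-likelihood.
    It is unbiased for the log-likelihood but very noisy, and plugging
    a noisy estimate into the MH acceptance ratio does NOT leave the
    posterior invariant; that failure motivates corrected schemes."""
    idx = rng.integers(0, n, size=m)
    return (n / m) * (-0.5 * np.sum((data[idx] - theta) ** 2))

theta, draws = 0.0, []
lp = subsampled_loglik(theta)
for _ in range(5_000):
    prop = theta + 0.005 * rng.normal()
    lp_prop = subsampled_loglik(prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # noisy acceptance test
        theta, lp = prop, lp_prop
    draws.append(theta)

# The printed mean can drift well away from the exact posterior mean:
# the subsampling noise dominates the acceptance ratio.
print("naive subsampled-MH mean:", np.mean(draws[1_000:]))
print("exact posterior mean:    ", data.mean())
```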
Post-Inference Methods for Scalable Probabilistic Modeling
This thesis, by Willie Neiswanger, focuses on post-inference methods: procedures that can be applied after standard inference algorithms have completed, to allow for increased efficiency, accuracy, or parallelism when learning probabilistic models of big data sets. These methods also aim to allow for efficient computation give...
Asymptotically Exact, Embarrassingly Parallel MCMC
Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, a...
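A minimal sketch of the subposterior idea follows, under assumptions not stated in the truncated abstract: a Gaussian mean with known unit variance, a flat prior, and the simple Gaussian product rule for recombining the independent chains (combination rules in this literature range from this parametric form to nonparametric density-product estimators). Each shard is sampled with no communication; only summary moments are combined at the end.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: data split across M "machines", simulated here
# as array shards processed in a loop.
data = rng.normal(loc=2.0, scale=1.0, size=100_000)
M = 10
shards = np.array_split(data, M)

def subposterior_samples(shard, n_draws=5000):
    """Random-walk MH targeting the subposterior
    (prior)^(1/M) x shard likelihood; with a flat prior this reduces
    to the shard's own likelihood."""
    theta, draws = 0.0, []
    lp = -0.5 * np.sum((shard - theta) ** 2)
    for _ in range(n_draws):
        prop = theta + 0.03 * rng.normal()
        lp_prop = -0.5 * np.sum((shard - prop) ** 2)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws.append(theta)
    return np.array(draws[1000:])

# Each shard is processed independently; in practice these loops would
# run on separate machines with essentially no communication.
sub_draws = [subposterior_samples(s) for s in shards]

# Parametric (Gaussian) combination: a precision-weighted average of
# subposterior moments approximates the full-data posterior.
precisions = np.array([1.0 / np.var(d) for d in sub_draws])
means = np.array([np.mean(d) for d in sub_draws])
combined_var = 1.0 / precisions.sum()
combined_mean = combined_var * np.sum(precisions * means)
print("combined posterior: mean %.4f, sd %.4f"
      % (combined_mean, np.sqrt(combined_var)))
print("exact posterior:    mean %.4f, sd %.4f"
      % (data.mean(), 1.0 / np.sqrt(len(data))))
```

For this conjugate Gaussian setting the product of the M subposteriors recovers the full-data posterior exactly, so the combined moments should closely match the analytic answer printed on the last line.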
Embarrassingly Parallel Acceleration of Global Tractography via Dynamic Domain Partitioning
Global tractography estimates brain connectivity by organizing signal-generating fiber segments in an optimal configuration that best describes the measured diffusion-weighted data, promising better stability than local greedy methods with respect to imaging noise. However, global tractography is computationally very demanding and requires computation times that are often prohibitive for clinic...